Fusion of Loops for Parallelism and Locality
نویسندگان
چکیده
Loop fusion improves data locality and reduces synchronization in data-parallel applications. However, loop fusion is not always legal. Even when legal, fusion may introduce loop-carried dependences which reduce parallelism. In addition, performance losses result from cache conflicts in fused loops. We present new, systematic techniques which: (1) allow fusion of loop nests in the presence of fusion-preventing dependences, (2) allow parallel execution of fused loops with minimal synchronization, and (3) eliminate cache conflicts in fused loops. We evaluate our techniques on a 56-processor KSR2 multiprocessor, and show improvements of up to 20% for representative loop nest sequences. The results also indicate a performance tradeoff as more processors are used, suggesting careful evaluation of the profitability of fusion.
منابع مشابه
Maximizing Loop
Loop fusion is a program transformation that merges multiple loops into one. It is eeective for reducing the synchronization overhead of parallel loops and for improving data locality. This paper presents three results for fusion: (1) a new algorithm for fusing a collection of parallel and sequential loops, minimizing parallel loop synchronization while maximizing parallelism; (2) a proof that ...
متن کاملMaximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution
Loop fusion is a program transformation that merges multiple loops into one. It is eeective for reducing the synchronization overhead of parallel loops and for improving data locality. This paper presents three results for fusion: (1) a new algorithm for fusing a collection of parallel and sequential loops, minimizing parallel loop synchronization while maximizing parallelism; (2) a proof that ...
متن کاملPolynomial-Time Nested Loop Fusion with Full Parallelism
Data locality and synchronization overhead are two important factors that affect the performance of applications on multiprocessors. Loop fusion is an effective way for reducing synchronization and improving data locality. Traditional fusion techniques, however, either can not address the case when fusion-preventing dependences exist in nested loops, or can not achieve good parallelism after fu...
متن کاملWith-Loop Fusion for Data Locality and Parallelism
Abstract. With-loops are versatile array comprehensions used in the functional array language SaC to implement universally applicable array operations. We describe the fusion of with-loops as a novel optimization technique to improve the data locality of compiled code. Experiments based on selected benchmark programs show the significance of withloop fusion for achieving competitive runtime per...
متن کاملEfficient Polynomial-Time Nested Loop Fusion with Full Parallelism
Data locality and synchronization overhead are two important factors that affect the performance of applications on multiprocessors. Loop fusion is an effective way for reducing synchronization and improving data locality. Traditional fusion techniques, however, either can not address the case when fusion-preventing dependencies exist in nested loops, or can not achieve good parallelism after f...
متن کامل